All the steps to create our final datasheet can be found in data_cleaning.R.

Importing the data

In this section, we import the data for further analysis. We only imported the necessary datasets. These were: - Fahrzeuge_OEM1_Typ11.csv - Fahrzeuge_OEM1_Typ12.csv - Fahrzeuge_OEM2_Typ21.csv - Fahrzeuge_OEM2_Typ22.csv - Bestandteile_Fahrzeuge_OEM1_Typ11.cs - Bestandteile_Fahrzeuge_OEM1_Typ12.cs - Bestandteile_Fahrzeuge_OEM2_Typ21.cs - Bestandteile_Fahrzeuge_OEM2_Typ22.cs - Bestandteile_Komponente_K1BE1.csv - Bestandteile_Komponente_K1BE2.csv - Einzelteil_T02.txt - Zulassungen_alle_Fahrzeuge.csv - Geodaten_Gemeinden_v1.2_2017-08-22_TrR.csv

Additional to the provided datasheets we needed tWO more data sources to get additional geographical information for the Shiny-App: - georef-germany-postleitzahl.csv - germany_municipalities.shp

The steps for this process were as followed:

Data Preparation

After the loading of the Dataframes, the data needed to be prepared for further analysis. Those were the steps for the data preparation:

This section of the document prepares the data for subsequent analysis and ensures data consistency and quality for further investigation.

Creation of the final data set

This final dataframe was written to the CSV file named “Final_dataset_group_17.csv.”. The unused dataframes were removed.

Evaluation

Result

The App fulfills the UI-task (light blue, font family and logo). The heatmap and final datasheet is integrated. There exist a box to test a vehicle and download vehicle type specific information.

The final Shiny app
The final Shiny app

AS result we got that ~10% of the provided ~3’000’000 vehicles are damaged. We can see that in the popups: most piecharts have around 10% damaged vehicles. This is also logical because if vehicles were delivered in a region, the proportion of damaged vehicles should be the mean. The damage doesn’t depend on the region. The regions where the largest number of vehicles were delivered have also the largest number of damaged vehicles.

Damaged and functioning vehicles Top 10 municipalities with the most damaged vehicles

The damaged vehicle heatmap
The damaged vehicle heatmap
Zoom on the heatmap that shows the highest values
Zoom on the heatmap that shows the highest values

We kept the following columns in the final datasheet: “Gemeinde”, “ID_Fahrzeug”, “ID_T02”, “ID_Motor”, “Beschaedigt”, “PLZ”, “Longitude”, “Latitude”, “StateCapital” and “waitingTime”. We decided to keep those columns because they prove why we marked a vehicle as damaged (“Beschaedigt” == “ja”) and they prove the waiting time and the location on the heatmap. So without any of those columns we coudn’t get all the needed information.

The final table
The final table